Judging Grammaticality with Tree Substitution Grammar Derivations

Author

  • Matt Post
Abstract

In this paper, we show that local features computed from the derivations of tree substitution grammars (TSGs), such as the identity of particular fragments and counts of large and small fragments, are useful in binary grammatical classification tasks. Such features outperform n-gram features and various model scores by a wide margin. Although they fall short of the performance of the hand-crafted feature set of Charniak and Johnson (2005) developed for parse tree reranking, they do so with an order of magnitude fewer features. Furthermore, since the TSGs employed are learned in a Bayesian setting, the use of their derivations can be viewed as the automatic discovery of tree patterns useful for classification. On the BLLIP dataset, we achieve an accuracy of 89.9% in discriminating between grammatical text and samples from an n-gram language model.
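The two kinds of local features named in the abstract (fragment identity, and counts of large and small fragments) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the derivation is represented as a list of bracketed fragment strings, a fragment's size is approximated by its number of opening brackets, and the `large_threshold` value and the function names are hypothetical.

```python
from collections import Counter


def fragment_size(fragment: str) -> int:
    """Approximate fragment size as the number of bracketed nodes.

    Assumption: fragments are bracketed strings, so each '(' opens one node.
    """
    return fragment.count("(")


def derivation_features(fragments, large_threshold=3):
    """Map a TSG derivation (a list of fragments) to a sparse feature dict.

    Emits one identity feature per distinct fragment, plus counts of
    'large' and 'small' fragments. The threshold is a hypothetical choice.
    """
    feats = Counter()
    for frag in fragments:
        feats["frag=" + frag] += 1          # fragment identity feature
        if fragment_size(frag) >= large_threshold:
            feats["num_large"] += 1         # count of large fragments
        else:
            feats["num_small"] += 1         # count of small fragments
    return dict(feats)


# Hypothetical derivation made of three fragments.
example = [
    "(S NP (VP (VBD said) SBAR))",
    "(NP (DT the) NN)",
    "(NN dog)",
]
features = derivation_features(example)
```

A feature dictionary of this shape could then be passed to any standard binary classifier; which classifier the paper uses is not stated in the abstract.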

Similar resources

Judging Grammaticality with Count-Induced Tree Substitution Grammars

Prior work has shown the utility of syntactic tree fragments as features in judging the grammaticality of text. To date such fragments have been extracted from derivations of Bayesian-induced Tree Substitution Grammars (TSGs). Evaluating on discriminative coarse and fine grammaticality classification tasks, we show that a simple, deterministic, count-based approach to fragment identification per...

D-Tree Substitution Grammars

There is considerable interest among computational linguists in lexicalized grammatical frameworks; lexicalized tree adjoining grammar (LTAG) is one widely studied example. In this paper, we investigate how derivations in LTAG can be viewed not as manipulations of trees but as manipulations of tree descriptions. Changing the way the lexicalized formalism is viewed raises questions as to the des...

Language Modeling with Tree Substitution Grammars

We show that a tree substitution grammar (TSG) induced with a collapsed Gibbs sampler results in lower perplexity on test data than both a standard context-free grammar and other heuristically trained TSGs, suggesting that it is better suited to language modeling. Training a more complicated bilexical parsing model across TSG derivations shows further (though nuanced) improvement. We conduct an...

Efficient Disambiguation by Means of Stochastic Tree Substitution Grammars

In Stochastic Tree Substitution Grammars (STSGs), one parse (tree) of an input sentence can be generated by exponentially many derivations; the probability of a parse is defined as the sum of the probabilities of its derivations. As a result, some methods of Stochastic Context-Free Grammars (SCFGs), e.g. the Viterbi algorithm for finding the most probable parse (MPP) of an input sentence, are not ...

A General, Sound and Efficient Natural Language Parsing Algorithm based on Syntactic Constraints Propagation

This paper presents a new context-free parsing algorithm based on a bidirectional strictly horizontal strategy which incorporates strong top–down predictions (derivations and adjacencies). From a functional point of view, the parser is able to propagate syntactic constraints reducing parsing ambiguity. From a computational perspective, the algorithm includes different techniques aimed at the im...

Publication date: 2011